Speeding-up the kernel k-means clustering method: A prototype based hybrid approach

Authors

  • T. Hitendra Sarma
  • P. Viswanath
  • B. Eswara Reddy
Abstract

The kernel k-means clustering method has been shown to be effective in identifying non-isotropic and linearly inseparable clusters in the input space. However, it is not suitable for large data-sets because its time complexity is quadratic in the size of the data-set. This paper presents a simple prototype based hybrid approach to speed up the kernel k-means method for large data-sets. The proposed method works in two stages. First, the data-set is partitioned into a number of small group-lets by using the leaders clustering method. Each group-let is represented by a prototype called its leader. The conventional leaders clustering method is modified so that these group-lets are formed in the kernel induced feature space. The data-set is re-indexed according to these group-lets. Next, the kernel k-means clustering method is applied to the set of leaders to derive a partition of the leaders set. Finally, each leader is replaced by its group-let to get a partition of the entire data-set. The time complexity of the proposed method is O(n + p^2), where n is the size of the data-set and p is the size of the leaders set. Its space complexity is also O(n + p^2). Experimental studies with data-sets of varying sizes show that, with a small loss of quality, the proposed algorithm can significantly reduce the computation time, particularly for large data-sets.

Corresponding author: Tel. +91-9493239032, Fax +91-8514-275123. Preprint submitted to Pattern Recognition Letters, July 26, 2011.
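
The two-stage idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a Gaussian (RBF) kernel, a feature-space distance threshold `tau`, and plain unweighted kernel k-means with random initial labels on the leaders. All function and parameter names (`rbf_kernel`, `kernel_leaders`, `kernel_kmeans`, `hybrid_kernel_kmeans`, `tau`, `gamma`) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between the rows of a and the rows of b."""
    sq = (a * a).sum(1)[:, None] + (b * b).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_leaders(X, tau, gamma=1.0):
    """Single pass over X: a point joins the first leader within
    feature-space distance tau, otherwise it becomes a new leader."""
    leaders, members = [], []
    for i, x in enumerate(X):
        if leaders:
            k = rbf_kernel(x[None, :], X[leaders], gamma)[0]
            # For the RBF kernel k(x, x) = 1, so
            # ||phi(x) - phi(l)||^2 = k(x,x) + k(l,l) - 2 k(x,l) = 2 - 2 k(x,l)
            d2 = 2.0 - 2.0 * k
            j = int(np.argmax(d2 <= tau ** 2))  # first leader within tau, if any
            if d2[j] <= tau ** 2:
                members[j].append(i)
                continue
        leaders.append(i)
        members.append([i])
    return leaders, members

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Plain kernel k-means on a precomputed p x p kernel matrix K
    (assumed variant: random initial labels, unweighted points)."""
    rng = np.random.default_rng(seed)
    p = K.shape[0]
    labels = rng.integers(0, k, size=p)
    for _ in range(n_iter):
        dist = np.full((p, k), np.inf)
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue  # an empty cluster attracts no points
            # ||phi(x) - mu_c||^2 = K_xx - 2 mean_j K_xj + mean_jl K_jl
            dist[:, c] = (np.diag(K) - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

def hybrid_kernel_kmeans(X, k, tau, gamma=1.0):
    """Stage 1: leaders in feature space. Stage 2: kernel k-means on the
    leaders only, then each leader's label is copied to its group-let."""
    leaders, members = kernel_leaders(X, tau, gamma)
    K = rbf_kernel(X[leaders], X[leaders], gamma)  # p x p, not n x n
    leader_labels = kernel_kmeans(K, k)
    labels = np.empty(len(X), dtype=int)
    for lab, group in zip(leader_labels, members):
        labels[group] = lab
    return labels, leaders, members

# Usage on synthetic data: two Gaussian blobs of 200 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (200, 2)), rng.normal(3.0, 0.3, (200, 2))])
labels, leaders, members = hybrid_kernel_kmeans(X, k=2, tau=0.3, gamma=0.5)
print(len(leaders), "leaders for", len(X), "points")
```

In this sketch the kernel matrix is built only over the leaders, so the quadratic cost applies to the leader set and not to the full data-set. A larger `tau` gives fewer leaders and a faster second stage, at the price of a coarser partition.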


Similar resources

Speeding-Up the K-Means Clustering Method: A Prototype Based Approach

This paper is about speeding up the k-means clustering method: the proposed approach processes the data at a faster pace but produces the same clustering result as the k-means method. We present a prototype based method in which the prototypes are derived using the leaders clustering method. Along with the prototypes, called leaders, some additional information is preserved, which enables deriving the k mean...


A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaningful clusters or groups. Because clustering algorithms are widely used in many fields, a lot of research is still going on to find the best and most efficient clustering algorithm. K-means is simple and easy to implement, but it is sensitive to the initialization of the cluster centers and hence gets trapped in local optima. In this paper...


A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...


Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach

The clustering problem under the minimum sum of squares criterion is a non-convex and non-linear program with many locally optimal values, so its solution often falls into these traps and fails to converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...


GROUND MOTION CLUSTERING BY A HYBRID K-MEANS AND COLLIDING BODIES OPTIMIZATION

The stochastic nature of earthquakes makes it challenging for engineers to choose which records to use in their analyses. Clustering is offered as a solution to this data mining problem: it automatically distinguishes between ground motion records based on similarities in the corresponding seismic attributes. The present work formulates an optimization problem to seek the best clustering measures. In...



Journal title:
  • Pattern Recognition Letters

Volume 34  Issue 

Pages  -

Publication date 2013